tests: add integration test suite for reindexing #28

Merged 4 commits on Feb 26, 2025

Conversation

conorsch
Contributor

Adds integration tests that fetch historical node archives from known URLs, unpack them, and build a reindexer_archive.bin containing all historical blocks for the target chain. The tests exercise only the penumbra-reindexer archive behavior; they don't yet exercise the penumbra-reindexer regen functionality, which future work can cover.

For now, it's enough to have a reproducible process that confirms all historical chain state (for the popular chains like penumbra-1 and penumbra-testnet-phobos-2) can be collected to form a complete picture of event data. This reproducible process does rely on the existence of external data sources, in the form of historical archives in a known format at known URLs. But at least now we can verify that's the case, with known checksums, as well.
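As an illustration of the checksum verification mentioned above, here is a minimal sketch in Python. It assumes SHA-256 digests; the PR does not say which hash the test suite actually uses, and the function names `sha256_of` and `verify_archive` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_sha256):
    """Raise ValueError if the file's digest differs from the known checksum."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
    return actual
```

Streaming matters here because the archives are large; hashing chunk by chunk keeps memory use flat regardless of file size.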

Testing and review

You can pull down this branch and run `just integration`, which will do the needful:

  1. download gzipped tar archives of historical node state
  2. generate an ephemeral node0 directory with key material
  3. clobber generated genesis file with the original genesis file for the target chain
  4. extract archive over node0 directory
  5. run penumbra-reindexer archive to create an sqlite3 database
  6. verify that no gaps are present in the sqlite3 database, i.e. all blocks are accounted for
  7. repeat for multiple chains (currently only penumbra-1 and penumbra-testnet-phobos-2 are handled)
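The gap check in step 6 can be sketched roughly as follows. This is not the test suite's actual implementation: the table name `blocks` and column name `height` are assumptions made for illustration, and the real archive schema may differ.

```python
import sqlite3


def find_gaps(conn, table="blocks", column="height"):
    """Return a list of (first_missing, last_missing) height ranges.

    `table` and `column` are hypothetical names, not the real schema.
    """
    gaps = []
    prev = None
    query = f"SELECT {column} FROM {table} ORDER BY {column}"
    for (height,) in conn.execute(query):
        # A jump of more than one between consecutive heights is a gap.
        if prev is not None and height - prev > 1:
            gaps.append((prev + 1, height - 1))
        prev = height
    return gaps


def assert_no_gaps(conn, table="blocks", column="height"):
    """Raise AssertionError if any block heights are missing."""
    gaps = find_gaps(conn, table, column)
    assert not gaps, f"missing block ranges: {gaps}"
```

Note that this only detects holes between the lowest and highest stored heights; confirming that the archive starts at the chain's first block and ends at the expected tip would need separate checks.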

However, be aware that doing so is a disk- and bandwidth-intensive process. On my machine, the process takes roughly 60-90 minutes and uses about 500GB of disk space.

By default, the app now logs at INFO level, without requiring an opt-in.
Accordingly, I've dialed down the log frequency on `penumbra-reindexer archive`
to log every 10,000th block, rather than every thousandth.
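
The throttling described in this commit message can be illustrated with a small Python sketch. The reindexer itself is not written in Python, so this only mirrors the idea; the logger name, `LOG_EVERY`, `should_log`, and `archive_blocks` are all hypothetical.

```python
import logging

logger = logging.getLogger("reindexer_sketch")

# Hypothetical constant: emit one progress line per 10,000 blocks.
LOG_EVERY = 10_000


def should_log(height, every=LOG_EVERY):
    """True only for heights that are exact multiples of `every`."""
    return height % every == 0


def archive_blocks(heights, every=LOG_EVERY):
    """Pretend to archive each block, logging progress at the throttled rate.

    Returns the heights at which a log line was emitted.
    """
    logged = []
    for height in heights:
        # Real archiving work would happen here.
        if should_log(height, every):
            logger.info("archived block %d", height)
            logged.append(height)
    return logged
```

With INFO logging on by default, a per-thousand cadence would produce ten times as many lines over a full chain history, which is the motivation for the coarser interval.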
@conorsch conorsch requested a review from cronokirby February 16, 2025 21:23
Collaborator

@cronokirby cronokirby left a comment

This looks really cool!

The PD crashing thing is really a shame, because it would be nice to use the data structure we have representing what the artifacts look like to also figure out what the two integration test splits need to look like, but there's no good way to encode that, unfortunately.

I think subsequently it would be nice to refactor things a bit so that artifacts.plinfra.net isn't hardcoded, but that can be a follow-up.
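
One possible shape for that follow-up, sketched in Python purely for illustration: read the base URL from an environment variable and fall back to the current host. The variable name `REINDEXER_ARTIFACTS_URL`, the `https` scheme, the helper `archive_url`, and the example paths are all assumptions, not anything this PR defines.

```python
import os
from urllib.parse import urljoin

# Assumed default: the host named in this review, served over HTTPS.
DEFAULT_BASE_URL = "https://artifacts.plinfra.net/"
# Hypothetical environment variable for overriding the host.
ENV_VAR = "REINDEXER_ARTIFACTS_URL"


def archive_url(relative_path, environ=None):
    """Build an archive URL from a configurable base instead of a hardcoded one."""
    env = os.environ if environ is None else environ
    base = env.get(ENV_VAR, DEFAULT_BASE_URL)
    # urljoin drops the last path segment unless the base ends with a slash.
    if not base.endswith("/"):
        base += "/"
    # Strip leading slashes so the path is joined relative to the base.
    return urljoin(base, relative_path.lstrip("/"))
```

Passing `environ` explicitly keeps the helper easy to test and would let a mirror or local fixture server stand in for the real host.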

@conorsch
Contributor Author

> The PD crashing thing is really a shame

Let's at least look to removing the sys-exit behavior in pd going forward. I'm skeptical it'll be feasible to backport revising the halt behavior for historical versions, but it's on the table.

> to also figure out what the two integration test splits need to look like

My thoughts exactly. It was tempting to make things DRY enough to generate plans from the archive declarations, but we're not quite there yet.

@conorsch conorsch merged commit 589bc85 into main Feb 26, 2025
5 checks passed
2 participants